On the Limitations of Provenance for Queries With Difference
The annotation of the results of database transformations was shown to be
very effective for various applications. Until recently, most works in this
context focused on positive query languages. Provenance semirings are a
particular approach that was proven effective for these languages, and it was
shown that when propagating provenance with semirings, the expected equivalence
axioms of the corresponding query languages are satisfied. There have been
several attempts to extend the framework to account for relational algebra
queries with difference. We show here that these suggestions fail to satisfy
some expected equivalence axioms (that in particular hold for queries on
"standard" set and bag databases). Interestingly, we show that this is not a
pitfall of these particular attempts, but rather every such attempt is bound to
fail in satisfying these axioms, for some semirings. Finally, we show
particular semirings for which an extension for supporting difference is
(im)possible.
Comment: TAPP 201
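For readers unfamiliar with the positive case the abstract builds on, the following is a minimal sketch of annotation propagation with a commutative semiring: alternative derivations of a tuple are combined with the semiring's addition, and joint use of tuples with its multiplication. It covers only projection and join, not difference. The names `Semiring`, `BAG`, `SET`, `project`, and `join` are illustrative assumptions, not code from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (hypothetical helper type for this sketch)."""
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


# Natural numbers give bag semantics; Booleans give set semantics.
BAG = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
SET = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)

# A relation maps each tuple to its annotation.
Relation = Dict[Tuple[Any, ...], Any]


def project(k: Semiring, rel: Relation, cols) -> Relation:
    """Projection: annotations of tuples that collapse together are added."""
    out: Relation = {}
    for t, ann in rel.items():
        key = tuple(t[c] for c in cols)
        out[key] = k.plus(out.get(key, k.zero), ann)
    return out


def join(k: Semiring, left: Relation, right: Relation) -> Relation:
    """Join on (last column of left) = (first column of right):
    annotations of the joined tuples are multiplied."""
    out: Relation = {}
    for lt, la in left.items():
        for rt, ra in right.items():
            if lt[-1] == rt[0]:
                key = lt + rt[1:]
                out[key] = k.plus(out.get(key, k.zero), k.times(la, ra))
    return out


# Bag semantics: annotations are multiplicities.
R = {("a", "b"): 2, ("a", "c"): 3}
S = {("b", "d"): 4, ("c", "d"): 1}
q_bag = project(BAG, join(BAG, R, S), [0, 2])  # {("a", "d"): 2*4 + 3*1}

# Set semantics: the same query over Boolean annotations.
R_set = {t: True for t in R}
S_set = {t: True for t in S}
q_set = project(SET, join(SET, R_set, S_set), [0, 2])
```

The same query code runs unchanged over both semirings, which is the property that makes the framework attractive for positive queries. The abstract's point is that no comparably uniform treatment of difference satisfies all the expected equivalence axioms for every semiring.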
Provenance for Aggregate Queries
We study in this paper provenance information for queries with aggregation.
Provenance information was studied in the context of various query languages
that do not allow for aggregation, and recent work has suggested capturing
provenance by annotating the different database tuples with elements of a
commutative semiring and propagating the annotations through query evaluation.
We show that aggregate queries pose novel challenges rendering this approach
inapplicable. Consequently, we propose a new approach, where we annotate with
provenance information not just tuples but also the individual values within
tuples, using provenance to describe the values' computation. We realize this
approach in a concrete construction, first for "simple" queries where the
aggregation operator is the last one applied, and then for arbitrary (positive)
relational algebra queries with aggregation; the latter queries are shown to be
more challenging in this context. Finally, we use aggregation to encode queries
with difference, and study the semantics obtained for such queries on
provenance-annotated databases.
Computing Possible and Certain Answers over Order-Incomplete Data
This paper studies the complexity of query evaluation for databases whose
relations are partially ordered; the problem commonly arises when combining or
transforming ordered data from multiple sources. We focus on queries in a
useful fragment of SQL, namely positive relational algebra with aggregates,
whose bag semantics we extend to the partially ordered setting. Our semantics
leads to the study of two main computational problems: the possibility and
certainty of query answers. We show that these problems are respectively
NP-complete and coNP-complete, but identify tractable cases depending on the
query operators or input partial orders. We further introduce a duplicate
elimination operator and study its effect on the complexity results.
Comment: 55 pages, 56 references. Extended journal version of
arXiv:1707.07222. Up to the stylesheet, page/environment numbering, and
possible minor publisher-induced changes, this is the exact content of the
journal paper that will appear in Theoretical Computer Science.
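The notions of possibility and certainty can be illustrated with a brute-force sketch, under the simplifying assumption that a partially ordered relation is a set of distinct items plus "must come before" constraints, and that its possible worlds are the linear extensions of that order. The function names below are hypothetical, and the enumeration is exponential; it only illustrates the definitions and is not the paper's algorithm or its full query semantics.

```python
from itertools import permutations


def linear_extensions(items, before):
    """Yield every total order of `items` consistent with the constraints.

    `before` is a collection of pairs (a, b) meaning a must precede b.
    Items are assumed to be distinct.
    """
    for perm in permutations(items):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in before):
            yield list(perm)


def is_possible(candidate, items, before):
    """A candidate ordering is possible if some linear extension equals it."""
    return any(ext == list(candidate) for ext in linear_extensions(items, before))


def is_certain(candidate, items, before):
    """A candidate ordering is certain if every linear extension equals it."""
    exts = list(linear_extensions(items, before))
    return len(exts) > 0 and all(ext == list(candidate) for ext in exts)


items = ["a", "b", "c"]
loose = [("a", "b")]               # c is unconstrained
tight = [("a", "b"), ("b", "c")]   # a total order

possible_loose = is_possible(["a", "c", "b"], items, loose)
certain_loose = is_certain(["a", "c", "b"], items, loose)
certain_tight = is_certain(["a", "b", "c"], items, tight)
```

With the loose constraints, three orderings are consistent, so `["a", "c", "b"]` is possible but not certain. With the tight constraints only one ordering remains, so it is certain.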
Just in Time: Personal Temporal Insights for Altering Model Decisions
The interpretability of complex Machine Learning models is becoming a
critical social concern, as they are increasingly used in human-related
decision-making processes such as resume filtering or loan applications.
Individuals receiving an undesired classification are likely to call for an
explanation -- preferably one that specifies what they should do in order to
alter that decision when they reapply in the future. Existing work focuses on a
single ML model and a single point in time, whereas in practice, both models
and data evolve over time: an explanation for an application rejection in 2018
may be irrelevant in 2019 since in the meantime both the model and the
applicant's data can change. To this end, we propose a novel framework that
provides users with insights and plans for changing their classification in
particular future time points. The solution is based on combining
state-of-the-art algorithms for (single) model explanations, ones for
predicting future models, and database-style querying of the obtained
explanations. We propose to demonstrate the usefulness of our solution in the
context of loan applications, and interactively engage the audience in
computing and viewing suggestions tailored for applicants based on their unique
characteristics.